// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= SQL Performance Tuning

This article outlines basic and advanced optimization techniques for Ignite SQL queries. Some of the sections are also useful for debugging and troubleshooting.

== Basic Considerations: Ignite vs RDBMS

Ignite is frequently compared to relational databases for its SQL capabilities, with the expectation that existing SQL
queries, written for an RDBMS, will work out of the box and perform faster in Ignite without any
changes. Usually, such an assumption is based on the fact that Ignite stores and processes data in memory.
However, just putting data in RAM is not enough to get an order-of-magnitude increase in performance; generally,
extra tuning is required. Below is a standard checklist of
best practices to consider before you benchmark Ignite against an RDBMS or do any performance testing:

* Ignite is optimized for _multi-node_ deployments with RAM as the primary storage. Don't
compare a single-node Ignite cluster to a relational database. Instead, deploy a multi-node Ignite cluster with the entire data set in RAM.

* Be ready to adjust your data model and existing SQL queries.
Use the link:data-modeling/affinity-collocation[affinity colocation] concept during the data
modelling phase for proper data distribution. Remember, it's not enough just to put data in RAM. If your data is properly colocated, you can run SQL queries with JOINs at massive scale and expect significant performance benefits.

* Define secondary indexes and use other standard, and Ignite-specific, tuning techniques described below.

* Keep in mind that relational databases leverage local caching techniques and, depending on the total data size, an
RDBMS can complete _some queries_ faster than Ignite, even when Ignite runs in a multi-node configuration.
For instance, if your data set is around 10-100 GB and the RDBMS has enough RAM to cache the data locally, it can
outperform a multi-node Ignite cluster because the latter incurs network overhead. Store much more data in Ignite to see the difference.


== Using the EXPLAIN Statement

Ignite supports the `EXPLAIN` statement, which can be used to read the execution plan of a query.
Use this command to analyse your queries for possible optimizations. Note that the plan contains multiple rows: the
last one describes the query executed on the reducing side (usually your application), while the others describe the queries executed on the map nodes (usually server nodes).
Read the link:SQL/sql-introduction#distributed-queries[Distributed Queries] section to learn how queries are executed in Ignite.

[source,sql]
----
EXPLAIN SELECT name FROM Person WHERE age = 26;
----

The execution plan is generated by H2 as described link:http://www.h2database.com/html/performance.html#explain_plan[here, window=_blank].
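The plan rows can also be read programmatically through the SQL API. The following is a minimal sketch, assuming a cache named `personCache` backs the `Person` table:

[source,java]
----
// Run EXPLAIN through the SQL API and print every row of the plan.
// The last row is the reducer-side query; the others are map-side queries.
SqlFieldsQuery explainQry = new SqlFieldsQuery(
    "EXPLAIN SELECT name FROM Person WHERE age = 26");

try (QueryCursor<List<?>> cursor = personCache.query(explainQry)) {
    for (List<?> row : cursor)
        System.out.println(row.get(0));
}
----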

== OR Operator and Selectivity

//*TODO*: is this still valid?

If a query contains an `OR` operator, then indexes may not be used as expected depending on the complexity of the query.
For example, for the query `select name from Person where gender='M' and (age = 20 or age = 30)`, an index on the `gender`
field will be used instead of an index on the `age` field, although the latter is a more selective index.
As a workaround for this issue, you can rewrite the query with `UNION ALL` (notice that `UNION` without `ALL` will return
`DISTINCT` rows, which will change the query semantics and will further penalize your query performance):

[source,sql]
----
SELECT name FROM Person WHERE gender='M' and age = 20
UNION ALL
SELECT name FROM Person WHERE gender='M' and age = 30
----

== Avoid Having Too Many Columns

Avoid having too many columns in the result set of a `SELECT` query. Due to limitations of the H2 query parser, queries
with 100+ columns may perform worse than expected.

== Lazy Loading

By default, Ignite executes SQL queries in lazy mode, which minimizes memory consumption at the cost of moderate performance degradation.

With lazy mode disabled, Ignite attempts to load the whole result set into memory and send it back to the query initiator (usually your application). This approach provides optimal performance for queries with small or medium result sets, and minimizes the duration of internal database locks, thus increasing concurrency.

[WARNING]
====
If lazy mode is disabled and the result set is too big to fit in the available memory, query execution can lead to prolonged GC pauses and even `OutOfMemoryError` exceptions. Note that the `lazy` property is deprecated since Ignite 2.15.0.
====

To change the default behavior: 

[tabs]
--

tab:Java[]
[source,java]
----
SqlFieldsQuery query = new SqlFieldsQuery("SELECT * FROM Person WHERE id > 10");

// Result set will be loaded eagerly (lazy mode disabled).
query.setLazy(false);
----
tab:JDBC[]
[source,sql]
----
jdbc:ignite:thin://192.168.0.15?lazy=false
----
tab:C#/.NET[]
[source,csharp]
----
var query = new SqlFieldsQuery("SELECT * FROM Person WHERE id > 10")
{
    // Result set will be loaded eagerly (lazy mode disabled).
    Lazy = false
};
----
tab:C++[]
--

////
*TODO* Add tabs for ODBC and other programming languages - C# and C++
////

== Querying Colocated Data

When Ignite executes a distributed query, it sends sub-queries to individual cluster nodes to fetch the data and groups
the results on the reducer node (usually your application).
If you know in advance that the data you are querying is link:data-modeling/affinity-collocation[colocated]
by the `GROUP BY` condition, you can use `SqlFieldsQuery.collocated = true` to tell the SQL engine to do the grouping on the remote nodes.
This will reduce network traffic between the nodes and query execution time.
When this flag is set to `true`, the query is executed on individual nodes first and the results are sent to the reducer node for final calculation.

Consider the following example, in which we assume that the data is colocated by `department_id` (in other words, the
`department_id` field is configured as the affinity key).

[source,sql]
----
SELECT SUM(salary) FROM Employee GROUP BY department_id
----

Because of the nature of the SUM operation, Ignite will sum the salaries across the elements stored on individual nodes,
and then send these sums to the reducer node where the final result will be calculated.
This operation is already distributed, and enabling the `collocated` flag will only slightly improve performance.

Let's take a slightly different example:

[source,sql]
----
SELECT AVG(salary) FROM Employee GROUP BY department_id
----

In this example, Ignite has to fetch all (`salary`, `department_id`) pairs to the reducer node and calculate the results there.
However, if employees are colocated by the `department_id` field, i.e. employee data for the same department
is stored on the same node, setting `SqlFieldsQuery.collocated = true` will reduce query execution time because Ignite
will calculate the averages for each department on the individual nodes and send the results to the reducer node for final calculation.
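With `SqlFieldsQuery`, the flag can be set as shown in the sketch below; the cache name `employeeCache` is an assumption:

[source,java]
----
// Data is colocated by department_id, so let each node compute
// its departments' averages locally before sending them to the reducer.
SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT AVG(salary) FROM Employee GROUP BY department_id");

qry.setCollocated(true);

List<List<?>> rows = employeeCache.query(qry).getAll();
----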


== Enforcing Join Order

When the enforce join order flag is set, the query optimizer will not reorder tables in joins.
In other words, the order in which joins are applied during query execution will be the same as specified in the query.
Without this flag, the query optimizer can reorder joins to improve performance.
However, sometimes it might make an incorrect decision.
This flag helps to control and explicitly specify the order of joins instead of relying on the optimizer.

Consider the following example:

[source, sql]
----
SELECT * FROM Person p
JOIN Company c ON p.company = c.name where p.name = 'John Doe'
AND p.age > 20
AND p.id > 5000
AND p.id < 100000
AND c.name NOT LIKE 'O%';
----

This query contains a join between two tables: `Person` and `Company`.
To get the best performance, we should understand which join will return the smallest result set.
The table with the smaller result set size should be given first in the join pair.
To get the size of each result set, let's test each part.

.Q1:
[source, sql]
----
SELECT count(*)
FROM Person p
where
p.name = 'John Doe'
AND p.age > 20
AND p.id > 5000
AND p.id < 100000;
----

.Q2:
[source, sql]
----
SELECT count(*)
FROM Company c
where
c.name NOT LIKE 'O%';
----

After running Q1 and Q2, we can get two different outcomes:

Case 1:
[cols="1,1",opts="stretch,autowidth",stripes=none]
|===
|Q1 | 30000
|Q2 |100000
|===

Q2 returns more entries than Q1.
In this case, we don't need to modify the original query, because the smaller result set is already on the left side of the join.

Case 2:
[cols="1,1",opts="stretch,autowidth",stripes=none]
|===
|Q1 | 50000
|Q2 |10000
|===

Q1 returns more entries than Q2. So we need to change the initial query as follows:

[source, sql]
----
SELECT *
FROM Company c
JOIN Person p
ON p.company = c.name
where
p.name = 'John Doe'
AND p.age > 20
AND p.id > 5000
AND p.id < 100000
AND c.name NOT LIKE 'O%';
----

The force join order hint can be specified as follows:

* link:SQL/JDBC/jdbc-driver#parameters[JDBC driver connection parameter]
* link:SQL/ODBC/connection-string-dsn#supported-arguments[ODBC driver connection attribute]
* If you use link:SQL/sql-api[SqlFieldsQuery] to execute SQL queries, you can set the enforce join order
hint by calling the `SqlFieldsQuery.setEnforceJoinOrder(true)` method.
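For example, a sketch of setting the hint via `SqlFieldsQuery`, using the query from the example above:

[source,java]
----
// Keep the join order exactly as written: Company first, then Person.
SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT * FROM Company c JOIN Person p ON p.company = c.name " +
    "WHERE p.name = 'John Doe' AND c.name NOT LIKE 'O%'");

qry.setEnforceJoinOrder(true);
----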


== Increasing Index Inline Size

Every entry in the index has a fixed size, which is calculated during index creation. This size is called the _index inline size_.
Ideally, this size should be enough to store the full indexed entry in serialized form.
When values are not fully included in the index, Ignite may need to perform additional data page reads during index lookups,
which can impair performance if persistence is enabled.

//If a value type allows, Ignite includes indexed values in the index itself to optimize querying and data updates.


Here is how values are stored in the index:

// the source code block below uses css-styles from the pygments library. If you change the highlighting library, you should change the syles as well.
[source,java,subs="quotes"]
----
[tok-kt]#int#
0     1       5
| tag | value |
[tok-k]#Total: 5 bytes#

[tok-kt]#long#
0     1       9
| tag | value |
[tok-k]#Total: 9 bytes#

[tok-kt]#String#
0     1      3             N
| tag | size | UTF-8 value |
[tok-k]#Total: 3 + string length#

[tok-kt]#POJO (BinaryObject)#
0     1         5
| tag | BO hash |
[tok-k]#Total: 5#
----

For primitive data types (bool, byte, short, int, etc.), Ignite automatically calculates the index inline size so that the values are included in full.
For example, for `int` fields, the inline size is 5 (1 byte for the tag and 4 bytes for the value itself). For `long` fields, the inline size is 9 (1 byte for the tag + 8 bytes for the value).

For binary objects, the index includes the hash of each object, which is enough to avoid collisions. The inline size is 5.

For variable-length data, indexes include only the first several bytes of the value.
Therefore, when indexing fields with variable-length data, we recommend that you estimate the length of your field values and set the inline size to a value that covers most (about 95%) or all values.
For example, if you have a `String` field where 95% of the values contain 10 characters or fewer, you can set the inline size for the index on that field to 13 (3 service bytes + 10 characters).

//For example, when you create a table with a single column primary key, Ignite will automatically create an index on the primary key.

The inline sizes explained above apply to single field indexes.
However, when you define an index on a field in the value object or on a non-primary key column, Ignite creates a _composite index_
by appending the primary key to the indexed value.
Therefore, when calculating the inline size for a composite index, add the inline size of the primary key.

//To summarize, when creating indexes on a variable size data fields, choose the inline size to include most of the values that the field will hold. For other data types, Ignite will calculate the inline size automatically.

Below is an example of index inline size calculation for a cache where both key and value are complex objects.

[source, java]
----
public class Key {
    @QuerySqlField
    private long id;

    @QuerySqlField
    @AffinityKeyMapped
    private long affinityKey;
}

public class Value {
    @QuerySqlField(index = true)
    private long longField;

    @QuerySqlField(index = true)
    private int intField;

    @QuerySqlField(index = true)
    private String stringField; // we suppose that 95% of the values are 10 symbols
}
----

The following table summarizes the inline index sizes for the indexes defined in the example above.

[cols="1,1,1,2",opts="stretch,header"]
|===
|Index | Kind | Recommended Inline Size | Comment

| (_key)
|Primary key index
| 5
|Inlined hash of a binary object (5)

|(affinityKey, _key)
|Affinity key index
|14
|Inlined long (9) + binary object's hash (5)

|(longField, _key)
|Secondary index
|14
|Inlined long (9) + binary object's hash (5)

|(intField, _key)
|Secondary index
|10
|Inlined int (5) + binary object's hash (5)

|(stringField, _key)
|Secondary index
|18
|Inlined string (13) + binary object's hash (5) (assuming that the string is {tilde}10 symbols)

|===
//_

//The inline size for the first two indexes is set via `CacheConfiguration.sqlIndexMaxInlineSize = 29` (because a single property is responsible for two indexes, we set it to the largest value).
//The inline size for the rest of the indexes is set when you define a corresponding index.
Note that you will only have to set the inline size for the index on `stringField`. For other indexes, Ignite will calculate the inline size automatically.
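For instance, the index on `stringField` could be declared with an explicit inline size via the annotation's `inlineSize` parameter. This is a sketch; check that the parameter is available in your Ignite version:

[source,java]
----
public class Value {
    // ... other indexed fields as in the example above ...

    // 18 bytes: inlined string (13) + inlined hash of the binary key (5).
    @QuerySqlField(index = true, inlineSize = 18)
    private String stringField;
}
----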

Refer to the link:SQL/indexes#configuring-index-inline-size[Configuring Index Inline Size] section for the information on how to change the inline size.

You can check the inline size of an existing index in the link:monitoring-metrics/system-views#indexes-view[INDEXES] system view.

[WARNING]
====
Note that since Ignite encodes strings to `UTF-8`, some characters use more than 1 byte.
====

== Query Parallelism

By default, a SQL query is executed in a single thread on each participating Ignite node. This approach is optimal for
queries returning small result sets involving index search. For example:

[source,sql]
----
SELECT * FROM Person WHERE id = ?;
----

Certain queries might benefit from being executed in multiple threads.
This relates to queries with table scans and aggregations, which is often the case for HTAP and OLAP workloads.
For example:

[source,sql]
----
SELECT SUM(salary) FROM Person;
----

The number of threads created on a single node for query execution is configured per cache and by default equals 1.
You can change the value by setting the `CacheConfiguration.queryParallelism` parameter.
If you create SQL tables using the CREATE TABLE command, you can use a link:configuring-caches/configuration-overview#cache-templates[cache template] to set this parameter.
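A sketch of setting the parameter programmatically in the cache configuration:

[source,java]
----
// Execute SQL queries over this cache in 4 threads on each node.
CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("Person");

cacheCfg.setQueryParallelism(4);
----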

If a query contains `JOINs`, then all the participating caches must have the same degree of parallelism.

== Index Hints

Index hints are useful in scenarios when you know that one index is more suitable for certain queries than another.
You can use them to instruct the query optimizer to choose a more efficient execution plan.
To do this, use the `USE INDEX(indexA,...,indexN)` clause as shown in the following example.


[source,sql]
----
SELECT * FROM Person USE INDEX(index_age)
WHERE salary > 150000 AND age < 35;
----


== Partition Pruning

Partition pruning is a technique that optimizes queries that use affinity keys in the `WHERE` condition. When
executing such a query, Ignite will scan only those partitions where the requested data is stored. This will reduce
query time because the query will be sent only to the nodes that store the requested partitions.

In the following example, the employee objects are colocated by the `id` field (if an affinity key is not set
explicitly then the primary key is used as the affinity key):


[source,sql]
----
CREATE TABLE employee (id BIGINT PRIMARY KEY, department_id INT, name VARCHAR)

/* This query is sent to the node where the requested key is stored */
SELECT * FROM employee WHERE id=10;

/* This query is sent to all nodes */
SELECT * FROM employee WHERE department_id=10;
----

In the next example, the affinity key is set explicitly and, therefore, will be used to colocate data and direct
queries to the nodes that keep primary copies of the data:


[source,sql]
----
CREATE TABLE employee (id BIGINT PRIMARY KEY, department_id INT, name VARCHAR) WITH "AFFINITY_KEY=department_id"

/* This query is sent to all nodes */
SELECT * FROM employee WHERE id=10;

/* This query is sent to the node where the requested key is stored */
SELECT * FROM employee WHERE department_id=10;
----


[NOTE]
====
Refer to link:data-modeling/affinity-collocation[affinity colocation] page for more details
on how data gets colocated and how it helps boost performance in distributed storages like Ignite.
====

== Skip Reducer on Update

When Ignite executes a DML operation, it first fetches all the affected intermediate rows to the reducer node (usually your application) for analysis, and only then prepares batches of updated values to be sent to the remote nodes.

This approach might affect performance and saturate the network if a DML operation has to move many entries.

Use this flag as a hint for the SQL engine to do all intermediate rows analysis and updates “in-place” on the server nodes.
The hint is supported for JDBC and ODBC connections.


[tabs]
--
tab:JDBC Connection String[]
[source,text]
----
//jdbc connection string
jdbc:ignite:thin://192.168.0.15?skipReducerOnUpdate=true
----
--


////
*TODO* Add tabs for ODBC and other programming languages - C# and C++
////

== SQL On-heap Row Cache

Ignite stores data and indexes in its own memory space outside of Java heap. This means that with every data
access, a part of the data will be copied from the off-heap space to Java heap, potentially deserialized, and kept in
the heap as long as your application or server node references it.

The SQL on-heap row cache is intended to store hot rows (key-value objects) in Java heap, minimizing resources
spent for data copying and deserialization. Each cached row refers to an entry in the off-heap region and can be
invalidated when one of the following happens:

* The master entry stored in the off-heap region is updated or removed.
* The data page that stores the master entry is evicted from RAM.

The on-heap row cache can be enabled for a specific cache/table (if you use CREATE TABLE to create SQL tables and caches,
then the parameter can be passed via a link:configuring-caches/configuration-overview#cache-templates[cache template]):

////
TODO Add tabs for ODBC/JDBC and other programming languages - Java C# and C++
////

[source,xml]
----
include::code-snippets/xml/sql-on-heap-cache.xml[tags=ignite-config;!discovery,indent=0]
----
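The same setting can also be applied programmatically. A sketch, assuming a cache named `Person`:

[source,java]
----
// Enable the SQL on-heap row cache for this cache.
CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("Person");

cacheCfg.setSqlOnheapCacheEnabled(true);
----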

////
*TODO* Add tabs for ODBC/JDBC and other programming languages - Java C# and C++
////

If the row cache is enabled, you might be able to trade RAM for performance. You might get up to a 2x performance increase for some SQL queries and use cases by allocating more RAM for rows caching purposes.

[WARNING]
====
[discrete]
=== SQL On-Heap Row Cache Size

Presently, the cache is unlimited and can occupy as much RAM as allocated to your memory data regions. Make sure to:

* Set the JVM max heap size equal to the total size of all the data regions that store caches for which this on-heap row cache is enabled.

* link:perf-and-troubleshooting/memory-tuning#java-heap-and-gc-tuning[Tune] JVM garbage collection accordingly.
====

== Using TIMESTAMP instead of DATE

//TODO: is this still valid?
Use the `TIMESTAMP` type instead of `DATE` whenever possible. Presently, the `DATE` type is serialized and deserialized very
inefficiently, resulting in performance degradation.
